charts/redpanda: Gateway API TLSRoute support for external access#1447
Adds a new external access mode using Gateway API TLSRoute resources
with SNI-based routing, enabling per-listener domain configuration
and removing the need for NodePort/LoadBalancer services.
Design:
- User provides their own Gateway; the chart only manages TLSRoutes
- Bootstrap ClusterIP service + per-broker ClusterIP services created
as TLSRoute backends
- Per-listener host/hostTemplate fields for SNI hostname configuration
- Each external listener gets a bootstrap TLSRoute and per-broker
TLSRoutes with unique SNI hostnames
- Default advertised port is 443 (configurable via gateway.advertisedPort)
- NodePort/LoadBalancer service generation is skipped in gateway mode
Example values:
external:
  enabled: true
  gateway:
    enabled: true
    parentRefs:
      - name: kafka-gateway
        sectionName: kafka
listeners:
  kafka:
    external:
      default:
        port: 9094
        host: kafka.example.com
        hostTemplate: kafka-$POD_ORDINAL.example.com
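As a rough illustration of how a `hostTemplate` such as the one above could yield one SNI hostname per broker, here is a minimal Go sketch. It assumes that `$POD_ORDINAL` is replaced with the broker's StatefulSet ordinal; `expandHostTemplate` is a hypothetical helper and the broker count of three is arbitrary. Neither is taken from the chart's code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// expandHostTemplate is a hypothetical helper, not the chart's implementation.
// It assumes $POD_ORDINAL stands for the broker's StatefulSet ordinal.
func expandHostTemplate(tmpl string, ordinal int) string {
	return strings.ReplaceAll(tmpl, "$POD_ORDINAL", strconv.Itoa(ordinal))
}

func main() {
	const hostTemplate = "kafka-$POD_ORDINAL.example.com"
	// Three brokers are assumed purely for illustration.
	for ordinal := 0; ordinal < 3; ordinal++ {
		fmt.Println(expandHostTemplate(hostTemplate, ordinal))
	}
}
```

Under that assumption, each broker gets a distinct hostname that a per-broker TLSRoute can match on.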
Closes #1361
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…TLSRoute
- Transpile Go source to Helm templates via gotohelm
- Define lightweight TLSRoute mirror types compatible with gotohelm (upstream Gateway API uses type aliases the transpiler cannot handle)
- Register TLSRoute in the chart Scheme for test deserialization
- Add gateway-api-tlsroute test case to template-cases.txtar
- Regenerate all generated files (CRDs, RBAC, deepcopy, schemas)
Golden test output confirms correct resource generation:
- Bootstrap TLSRoute with SNI hostname pointing to bootstrap ClusterIP service
- Per-broker TLSRoutes with interpolated hostnames pointing to per-broker services
- ClusterIP services (bootstrap + per-pod) as TLSRoute backends
- NodePort/LoadBalancer services correctly skipped in gateway mode
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Changes gateway mode from a global all-or-nothing switch to a per-listener opt-in. Each external listener can independently set `gateway: true` to use TLSRoute mode while other listeners remain on NodePort/LoadBalancer.
This enables gradual migration:
1. Deploy Gateway, configure external.gateway with parentRefs
2. Add a new listener with gateway: true alongside existing ones
3. Migrate clients to the new TLSRoute-based listener
4. Remove the old NodePort/LoadBalancer listener
Key changes:
- Add Gateway *bool field to ExternalListener
- ServicePorts() skips gateway-enabled listeners (NodePort/LB)
- gatewayServicePorts() only includes gateway-enabled listeners
- TLSRoutes only created for listeners with gateway: true
- advertisedHostJSON checks per-listener gateway flag
- Remove global NodePort/LB suppression guards
- Add gateway-api-migration test case showing dual-mode operation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
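The per-listener opt-in described in this commit can be pictured with a small Go sketch. Only the `Gateway *bool` field on `ExternalListener` comes from the commit message; the `Port` field, the `usesGateway` and `splitListeners` helpers, the listener names, and the choice to treat a nil pointer as "not opted in" are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// ExternalListener is a simplified stand-in for the chart's type. Only the
// Gateway *bool field is taken from the commit message.
type ExternalListener struct {
	Port    int
	Gateway *bool
}

// usesGateway treats a nil pointer as "not opted in" (an assumption).
func (l ExternalListener) usesGateway() bool {
	return l.Gateway != nil && *l.Gateway
}

// splitListeners is a hypothetical helper that partitions listener names into
// those served by NodePort/LoadBalancer and those served by TLSRoutes.
func splitListeners(listeners map[string]ExternalListener) (traditional, gateway []string) {
	for name, l := range listeners {
		if l.usesGateway() {
			gateway = append(gateway, name)
		} else {
			traditional = append(traditional, name)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(traditional)
	sort.Strings(gateway)
	return traditional, gateway
}

func main() {
	enabled := true
	listeners := map[string]ExternalListener{
		"default":  {Port: 9094},                    // existing listener, no opt-in
		"tlsroute": {Port: 9095, Gateway: &enabled}, // new listener with gateway: true
	}
	traditional, gateway := splitListeners(listeners)
	fmt.Println("NodePort/LoadBalancer:", traditional)
	fmt.Println("TLSRoute:", gateway)
}
```

Because each listener lands in exactly one group, both kinds can coexist during a migration.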
- Rename example hostnames from kafka-*-broker to redpanda-*-broker in test cases and golden files
- Add changie changelog entry for the Gateway API TLSRoute feature
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All three CI failures had the same root cause: the chart's lightweight TLSRoute type was registered in the chart's Scheme but not in the operator's UnifiedScheme. This caused "no kind is registered for the type redpanda.TLSRoute" errors in:
- TestFieldManagers / TestFieldManagersRegression (migration tests)
- TestControllerRBAC (controller tests)
- TestV2ResourceClient (lifecycle tests)
- kuttl (operator binary fails to start)
Fixes:
- Register TLSRoute in the operator's V2Scheme and UnifiedScheme via scheme.go init()
- Fix unparam lint: addTLSRouteToScheme no longer returns an always-nil error
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Revert protobuf files to match CI output (buf generate strips license headers that the license updater adds)
- Revert zz_generated_status.go import ordering to match CI
- Regenerate operator chart golden files to include the new gateway.networking.k8s.io/tlsroutes RBAC rule in all test cases
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add omitempty to GatewayParentRef.Name so the zero value is omitted
during JSON serialization, matching the generated PartialGatewayParentRef
which uses *string with omitempty. Fixes TestHelmValuesCompat fuzz test
that detected a round-trip mismatch: CRD emitted {"name":""} but the
partial expected {} for an empty GatewayParentRef.
- Fix gci import ordering in zz_generated.deprecations_test.go:
github.com/redpanda-data/common-go belongs in the third-party group.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
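The round-trip mismatch described in this commit follows from how `encoding/json` handles a plain `string` versus a `*string` with `omitempty`. The sketch below reproduces that behavior with simplified stand-ins for the two types; the real types have more fields, and `mustJSON` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified stand-ins for the types named in the commit message.
type GatewayParentRef struct {
	Name string `json:"name"` // no omitempty: the zero value is serialized
}

type PartialGatewayParentRef struct {
	Name *string `json:"name,omitempty"` // nil pointer is omitted
}

// mustJSON is a hypothetical helper that marshals v or panics.
func mustJSON(v any) string {
	out, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	fmt.Println(mustJSON(GatewayParentRef{}))        // {"name":""}
	fmt.Println(mustJSON(PartialGatewayParentRef{})) // {}
}
```

The two outputs differ for an empty value, which is the kind of discrepancy a round-trip fuzz test flags.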
Use direct type conversion TLSRouteParentRef(ref) instead of field-by-field struct literal, as the linter's unconvert/gocritic fix recommends (identical field sets allow direct conversion).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
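For readers unfamiliar with this Go feature: two struct types whose fields match in name, type, and order can be converted directly, and struct tags are ignored for that purpose. A minimal sketch follows. The field sets are illustrative assumptions rather than the chart's real definitions; only the type names and `toTLSRouteParentRefs` appear in this PR.

```go
package main

import "fmt"

// Illustrative field sets; the real types may carry more fields.
type GatewayParentRef struct {
	Name        string `json:"name"`
	SectionName string `json:"sectionName,omitempty"`
}

type TLSRouteParentRef struct {
	Name        string `json:"name"`
	SectionName string `json:"sectionName,omitempty"`
}

// toTLSRouteParentRefs converts each ref directly instead of copying fields
// one by one. This compiles only while both field sets stay identical.
func toTLSRouteParentRefs(refs []GatewayParentRef) []TLSRouteParentRef {
	out := make([]TLSRouteParentRef, 0, len(refs))
	for _, ref := range refs {
		out = append(out, TLSRouteParentRef(ref))
	}
	return out
}

func main() {
	refs := []GatewayParentRef{{Name: "kafka-gateway", SectionName: "kafka"}}
	fmt.Printf("%+v\n", toTLSRouteParentRefs(refs))
}
```

A side benefit of the direct conversion is that adding a field to only one of the types becomes a compile error rather than a silently dropped value.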
The previous commit simplified toTLSRouteParentRefs to use a direct type conversion but did not regenerate the Helm template. The gotohelm transpiler now emits a simpler template that just copies the ref directly instead of field-by-field merging.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two fixes:
1. TestHelmValuesCompat: The fuzz test generates GatewayParentRef with Name=nil in the partial (*string), but the CRD type uses Name string (always serialized). Add a fixup callback to ensure Name is always non-nil in the fuzz input, matching the CRD's required field behavior. Revert the omitempty addition on both types since Name is required.
2. Golden files: After the toTLSRouteParentRefs type conversion change, the template no longer emits null fields (group, kind, namespace) for parent refs. Regenerated golden files to match.
Verified locally:
- TestHelmValuesCompat passes 10/10 runs
- TestTemplate (all cases) passes
- golangci-lint clean
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previous k8s:generate run reverted the import fix. Reapplying: github.com/redpanda-data/common-go belongs in the third-party group.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This PR is stale because it has been open 5 days with no activity. Remove stale label or comment or this will be closed in 5 days.
End-to-end test results (PASS)
Validated this PR locally on a k3d cluster (HEAD
Stack
What worked
The design works end-to-end: client connects to the bootstrap SNI hostname, Envoy passes TLS through to a broker, the broker advertises the per-broker hostname
Minor papercuts hit during the test
🤖 Generated with Claude Code
Summary
- `gateway: true` opt-in enables gradual migration: traditional and TLSRoute listeners coexist
- `host`/`hostTemplate` fields enable unique SNI hostnames per external listener, solving the per-listener domain problem (Different domain per listener #1361)
- … `parentRefs`
Design
The design follows the established pattern for TLSRoute-based access using Gateway API:
- … `gateway: true`, enabling gradual migration
Using TLS Passthrough (recommended)
In passthrough mode, the Gateway forwards the TLS connection as-is to the Redpanda broker. Redpanda's own TLS certificate is used, and mTLS authentication works.
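To make the routing idea concrete, here is a toy Go model of SNI-based selection. The Gateway picks a backend from the server name the client sends and forwards the encrypted stream without decrypting it. The hostnames follow the examples used in this description; the backend service names and the `backendFor` helper are hypothetical, and a real Gateway implementation does considerably more than a map lookup.

```go
package main

import "fmt"

// routes maps an SNI hostname to a backend address.
// The service names are hypothetical placeholders.
var routes = map[string]string{
	"redpanda.example.com":          "bootstrap-service:9094",
	"redpanda-0-broker.example.com": "broker-0-service:9094",
	"redpanda-1-broker.example.com": "broker-1-service:9094",
	"redpanda-2-broker.example.com": "broker-2-service:9094",
}

// backendFor returns the backend for the server name presented by the client.
// The TLS payload itself is never inspected in this model.
func backendFor(sni string) (string, bool) {
	backend, ok := routes[sni]
	return backend, ok
}

func main() {
	for _, sni := range []string{"redpanda.example.com", "redpanda-1-broker.example.com", "unknown.example.com"} {
		if backend, ok := backendFor(sni); ok {
			fmt.Printf("%s -> %s\n", sni, backend)
		} else {
			fmt.Printf("%s -> no matching route\n", sni)
		}
	}
}
```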
Step 1: Create a Gateway with TLS passthrough
Step 2: Configure Redpanda with gateway listeners
Step 3: Configure DNS
Point these DNS records to the Gateway's external IP:
- `redpanda.example.com` → Gateway IP (bootstrap)
- `redpanda-0-broker.example.com` → Gateway IP (broker 0)
- `redpanda-1-broker.example.com` → Gateway IP (broker 1)
- `redpanda-2-broker.example.com` → Gateway IP (broker 2)
Step 4: Connect clients
Using TLS Termination
In termination mode, the Gateway decrypts TLS and forwards plaintext to the broker. The Gateway's own certificate is presented to clients. mTLS authentication is not available in this mode.
Step 1: Create a Gateway with TLS termination
Step 2: Configure Redpanda without TLS on the external listener
Since the Gateway handles TLS, the Redpanda listener receives plaintext:
Migrating from Traditional Listeners to Gateway API
The per-listener `gateway: true` field enables zero-downtime migration. Traditional NodePort/LoadBalancer listeners and TLSRoute listeners coexist on different ports.
Step 1: Deploy the Gateway
Create the Gateway resource in your cluster:
Step 2: Add a TLSRoute listener alongside the existing one
Update your Helm values to add a new listener with `gateway: true`. The existing NodePort listener continues to work:
After `helm upgrade`, the cluster has both:
Step 3: Configure DNS and migrate clients
- … `redpanda.example.com:9094`)
Step 4: Remove the old listener
Once all clients have migrated, remove the NodePort listener:
Optionally remove `external.type: NodePort` as it is no longer used.
Verified end-to-end with Envoy Gateway + `rpk`
Reproducible recipe used to validate this PR on a local k3d cluster, producing and consuming over TLS through the Gateway. Run from the `feat/gateway-api-tlsroute` branch.
1. Cluster + dependencies
2. Create a Gateway (user-managed, the chart only attaches TLSRoutes)
3. Install the Redpanda chart with a gateway-mode listener
`values.yaml`:
What the chart renders:
- TLSRoute `rp-kafka-default-bootstrap`: `redpanda.test.local` → `Service/rp-gateway-bootstrap:9094`
- TLSRoute `rp-kafka-default-0`: `redpanda-0.test.local` → `Service/gw-rp-0:9094`
- Service (ClusterIP) `rp-gateway-bootstrap`
- Service (ClusterIP) `gw-rp-0`: … `rp-0`
- Certificate `rp-default-cert`: … `*.test.local` and `test.local`, covering both hostnames
Verify the routes are attached:
4. Connect with `rpk` over TLS
For local testing without DNS, run `rpk` in a container on the same Docker network as the cluster, mapping the SNI hostnames to the Envoy data-plane IP via `--add-host`:
Result on the validation run:
SNI-based routing confirmed in the Envoy data-plane access log:
| requested_server_name | upstream_cluster |
| --- | --- |
| redpanda.test.local | tlsroute/rp-gw/rp-kafka-default-bootstrap/rule/-1 |
| redpanda-0.test.local | tlsroute/rp-gw/rp-kafka-default-0/rule/-1 |

So the design works end-to-end: bootstrap connection → bootstrap TLSRoute → bootstrap service; then the broker advertises `redpanda-0.test.local:9094`; the client SNI-reconnects; Envoy routes by SNI to the per-broker TLSRoute → per-broker ClusterIP → broker pod, all under TLS Passthrough using Redpanda's own cert (no Gateway-side cert needed).
Files changed
- `charts/redpanda/values.go`: `GatewayConfig`, `GatewayParentRef` types; per-listener `Gateway`/`Host`/`HostTemplate` fields; `ServicePorts()` filters gateway listeners
- `charts/redpanda/service.gateway.go`
- `charts/redpanda/tlsroute.go`
- `charts/redpanda/chart.go`
- `charts/redpanda/secrets.go`
- `charts/redpanda/service.{loadbalancer,nodeport}.go`
- `operator/api/.../redpanda_clusterspec_types.go`
- `operator/.../redpanda_controller.go`: … `gateway.networking.k8s.io/tlsroutes`
Out of scope (future work)
Test plan
- `go build ./charts/redpanda/...`
- `go build ./operator/...`
- `gateway-api-tlsroute`: all listeners on TLSRoute
- `gateway-api-migration`: NodePort + TLSRoute coexisting
- `TestTemplate` suite passes with no regressions
- … `rpk` over TLS, SNI routing verified (recipe + results above)
- … `task generate` in CI
Closes #1361
🤖 Generated with Claude Code